SEQUENCE LISTING

5 (1) GENERAL INFORMATION: 

7 (i) APPLICANT: Choi et al.

9 (ii) TITLE OF INVENTION: Streptococcus pneumoniae Antigens and Vaccines 

11 (iii) NUMBER OF SEQUENCES: 452

13 (iv) CORRESPONDENCE ADDRESS: 

15 (A) ADDRESSEE: Human Genome Sciences, Inc. 

16 (B) STREET: 9410 Key West Avenue 

17 (C) CITY: Rockville 

18 (D) STATE: Maryland 

19 (E) COUNTRY: USA 

20 (F) ZIP: 20850 

23 (v) COMPUTER READABLE FORM:

25 (A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

26 (B) COMPUTER: HP Vectra 486/33 

27 (C) OPERATING SYSTEM: MSDOS version 6.2 

28 (D) SOFTWARE: ASCII Text
31 (vi) CURRENT APPLICATION DATA: 

C--> 33 (A) APPLICATION NUMBER: US/09/765,271 

C--> 34 (B) FILING DATE: 22-Jan-2001 

35 (C) CLASSIFICATION: 

38 (vii) PRIOR APPLICATION DATA:

40 (A) APPLICATION NUMBER: 09/536,784 

41 (B) FILING DATE: 

43 (A) APPLICATION NUMBER: 08/961,083 

44 (B) FILING DATE: OCT-30-1997 

47 (viii) ATTORNEY/AGENT INFORMATION:

49 (A) NAME: Michelle S. Marks 

50 (B) REGISTRATION NUMBER: 41,971 

51 (C) REFERENCE/DOCKET NUMBER: PB340P3 
C--> 54 (ix) TELECOMMUNICATION INFORMATION: 

56 (A) TELEPHONE: (301) 309-8504 

57 (B) TELEFAX: (301) 309-8512
60 (2) INFORMATION FOR SEQ ID NO: 1: 

62 (i) SEQUENCE CHARACTERISTICS: 

63 (A) LENGTH: 1999 base pairs 

64 (B) TYPE: nucleic acid 

65 (C) STRANDEDNESS: double
66 (D) TOPOLOGY: linear

69 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



71 TAAAATCTAC GACAATAAAA ATCAACTCAT TGCTGACTTG GGTTCTGAAC GCCGCGTCAA 60

73 TGCCCAAGCT AATGATATTC CCACAGATTT GGTTAAGGCA ATCGTTTCTA TCGAAGACCA 120

75 TCGCTTCTTC GACCACAGGG GGATTGATAC CATCCGTATC CTGGGAGCTT TCTTGCGCAA 180

77 TCTGCAAAGC AATTCCCTCC AAGGTGGATC AACTCTCACC CAACAGTTGA TTAAGTTGAC 240

79 TTACTTTTCA ACTTCGACTT CCGACCAGAC TATTTCTCGT AAGGCTCAGG AAGCTTGGTT 300

81 AGCGATTCAG TTAGAACAAA AAGCAACCAA GCAAGAAATC TTGACCTACT ATATAAATAA 360

83 GGTCTACATG TCTAATGGGA ACTATGGAAT GCAGACAGCA GCTCAAAACT ACTATGGTAA 420

85 AGACCTCAAT AATTTAAGTT TACCTCAGTT AGCCTTGCTG GCTGGAATGC CTCAGGCACC 480

87 AAACCAATAT GACCCCTATT CACATCCAGA AGCAGCCCAA GACCGCCGAA ACTTGGTCTT 540

89 ATCTGAAATG AAAAATCAAG GCTACATCTC TGCTGAACAG TATGAGAAAG CAGTCAATAC 600

91 ACCAATTACT GATGGACTAC AAAGTCTCAA ATCAGCAAGT AATTACCCTG CTTACATGGA 660

93 TAATTACCTC AAGGAAGTCA TCAATCAAGT TGAAGAAGAA ACAGGCTATA ACCTACTCAC 720

95 AACTGGGATG GATGTCTACA CAAATGTAGA CCAAGAAGCT CAAAAACATC TGTGGGATAT 780

97 TTACAATACA GACGAATACG TTGCCTATCC AGACGATGAA TTGCAAGTCG CTTCTACCAT 840

99 TGTTGATGTT TCTAACGGTA AAGTCATTGC CCAGCTAGGA GCACGCCATC AGTCAAGTAA 900

101 TGTTTCCTTC GGAATTAACC AAGCAGTAGA AACAAACCGC GACTGGGGAT CAACTATGAA 960

103 ACCGATCACA GACTATGCTC CTGCCTTGGA GTACGGTGTC TACGATTCAA CTGCTACTAT 1020

105 CGTTCACGAT GAGCCCTATA ACTACCCTGG GACAAATACT CCTGTTTATA ACTGGGATAG 1080

107 GGGCTACTTT GGCAACATCA CCTTGCAATA CGCCCTGCAA CAATCGCGAA ACGTCCCAGC 1140

109 CGTGGAAACT CTAAACAAGG TCGGACTCAA CCGCGCCAAG ACTTTCCTAA ATGGTCTAGG 1200

111 AATCGACTAC CCAAGTATTC ACTACTCAAA TGCCATTTCA AGTAACACAA CCGAATCAGA 1260

113 CAAAAAATAT GGAGCAAGTA GTGAAAAGAT GGCTGCTGCT TACGCTGCCT TTGCAAATGG 1320

115 TGGAACTTAC TATAAACCAA TGTATATCCA TAAAGTCGTC TTTAGTGATG GGAGTGAAAA 1380

117 AGAGTTCTCT AATGTCGGAA CTCGTGCCAT GAAGGAAACG ACAGCCTATA TGATGACCGA 1440

119 CATGATGAAA ACAGTCTTGA CTTATGGAAC TGGACGAAAT GCCTATCTTG CTTGGCTCCC 1500

121 TCAGGCTGGT AAAACAGGAA CCTCTAACTA TACAGACGAG GAAATTGAAA ACCACATCAA 1560

123 GACCTCTCAA TTTGTAGCAC CTGATGAACT ATTTGCTGGC TATACGCGTA AATATTCAAT 1620

125 GGCTGTATGG ACAGGCTATT CTAACCGTCT GACACCACTT GTAGGCAATG GCCTTACGGT 1680

127 CGCTGCCAAA GTTTACCGCT CTATGATGAC CTACCTGTCT GAAGGAAGCA ATCCAGAAGA 1740

129 TTGGAATATA CCAGAGGGGC TCTACAGAAA TGGAGAATTC GTATTTAAAA ATGGTGCTCG 1800

131 TTCTACGTGG AACTCACCTG CTCCACAACA ACCCCCATCA ACTGAAAGTT CAAGCTCATC 1860

133 ATCAGATAGT TCAACTTCAC AGTCTAGCTC AACCACTCCA AGCACAAATA ATAGTACGAC 1920

135 TACCAATCCT AACAATAATA CGCAACAATC AAATACAACC CCTGATCAAC AAAATCAGAA 1980

137 TCCTCAACCA GCACAACCA 1999
139 (2) INFORMATION FOR SEQ ID NO: 2: 



141 (i) SEQUENCE CHARACTERISTICS: 

142 (A) LENGTH: 666 amino acids 

143 (B) TYPE: amino acid 

144 (C) STRANDEDNESS: single 

145 (D) TOPOLOGY: linear 
147 (ii) MOLECULE TYPE: protein

150 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

152 Lys Ile Tyr Asp Asn Lys Asn Gln Leu Ile Ala Asp Leu Gly Ser Glu
153 1               5                   10                  15

155 Arg Arg Val Asn Ala Gln Ala Asn Asp Ile Pro Thr Asp Leu Val Lys
156             20                  25                  30

158 Ala Ile Val Ser Ile Glu Asp His Arg Phe Phe Asp His Arg Gly Ile
159         35                  40                  45

161 Asp Thr Ile Arg Ile Leu Gly Ala Phe Leu Arg Asn Leu Gln Ser Asn
162     50                  55                  60

164 Ser Leu Gln Gly Gly Ser Thr Leu Thr Gln Gln Leu Ile Lys Leu Thr
165 65                  70                  75                  80

167 Tyr Phe Ser Thr Ser Thr Ser Asp Gln Thr Ile Ser Arg Lys Ala Gln
168                 85                  90                  95

170 Glu Ala Trp Leu Ala Ile Gln Leu Glu Gln Lys Ala Thr Lys Gln Glu
171             100                 105                 110

246             500                 505                 510

248 Glu Glu Ile Glu Asn His Ile Lys Thr Ser Gln Phe Val Ala Pro Asp
249         515                 520                 525

251 Glu Leu Phe Ala Gly Tyr Thr Arg Lys Tyr Ser Met Ala Val Trp Thr
252     530                 535                 540

254 Gly Tyr Ser Asn Arg Leu Thr Pro Leu Val Gly Asn Gly Leu Thr Val
255 545                 550                 555                 560

257 Ala Ala Lys Val Tyr Arg Ser Met Met Thr Tyr Leu Ser Glu Gly Ser
258                 565                 570                 575

260 Asn Pro Glu Asp Trp Asn Ile Pro Glu Gly Leu Tyr Arg Asn Gly Glu
261             580                 585                 590

263 Phe Val Phe Lys Asn Gly Ala Arg Ser Thr Trp Asn Ser Pro Ala Pro
264         595                 600                 605

266 Gln Gln Pro Pro Ser Thr Glu Ser Ser Ser Ser Ser Ser Asp Ser Ser
267     610                 615                 620

269 Thr Ser Gln Ser Ser Ser Thr Thr Pro Ser Thr Asn Asn Ser Thr Thr
270 625                 630                 635                 640

272 Thr Asn Pro Asn Asn Asn Thr Gln Gln Ser Asn Thr Thr Pro Asp Gln
273                 645                 650                 655

275 Gln Asn Gln Asn Pro Gln Pro Ala Gln Pro
276             660                 665
278 (2) INFORMATION FOR SEQ ID NO: 3: 

280 (i) SEQUENCE CHARACTERISTICS: 

281 (A) LENGTH: 1714 base pairs 

282 (B) TYPE: nucleic acid 

283 (C) STRANDEDNESS: double 

284 (D) TOPOLOGY: linear 

288 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

290 AAATTACAAT ACGGACTATG AATTGACCTC TGGAGAAAAA TTACCTCTTC CTAAAGAGAT 60

292 TTCAGGTTAC ACTTATATTG GATATATCAA AGAGGGAAAA ACGACTTCTG AGTCTGAAGT 120

294 AAGTAATCAA AAGAGTTCAG TTGCCACTCC TACAAAACAA CAAAAGGTGG ATTATAATGT 180

296 TACACCGAAT TTTGTAGACC ATCCATCAAC AGTACAAGCT ATTCAGGAAC AAACACCTGT 240

298 TTCTTCAACT AAGCCGACAG AAGTTCAAGT AGTTGAAAAA CCTTTCTCTA CTGAATTAAT 300

300 CAATCCAAGA AAAGAAGAGA AACAATCTTC AGATTCTCAA GAACAATTAG CCGAACATAA 360

302 GAATCTAGAA ACGAAGAAAG AGGAGAAGAT TTCTCCAAAA GAAAAGACTG GGGTAAATAC 420

304 ATTAAATCCA CAGGATGAAG TTTTATCAGG TCAATTGAAC AAACCTGAAC TCTTATATCG 480

306 TGAGGAAACT ATGGAGACAA AAATAGATTT TCAAGAAGAA ATTCAAGAAA ATCCTGATTT 540

308 AGCTGAAGGA ACTGTAAGAG TAAAACAAGA AGGTAAATTA GGTAAGAAAG TTGAAATCGT 600

310 CAGAATATTC TCTGTAAACA AGGAAGAAGT TTCGCGAGAA ATTGTTTCAA CTTCAACGAC 660

312 TGCGCCTAGT CCAAGAATAG TCGAAAAAGG TACTAAAAAA ACTCAAGTTA TAAAGGAACA 720

314 ACCTGAGACT GGTGTAGAAC ATAAGGACGT ACAGTCTGGA GCTATTGTTG AACCCGCAAT 780

316 TCAGCCTGAG TTGCCCGAAG CTGTAGTAAG TGACAAAGGC GAACCAGAAG TTCAACCTAC 840

318 ATTACCCGAA GCAGTTGTGA CCGACAAAGG TGAGACTGAG GTTCAACCAG AGTCGCCAGA 900

320 TACTGTGGTA AGTGATAAAG GTGAACCAGA GCAGGTAGCA CCGCTTCCAG AATATAAGGG 960

322 TAATATTGAG CAAGTAAAAC CTGAAACTCC GGTTGAGAAG ACCAAAGAAC AAGGTCCAGA 1020

324 AAAAACTGAA GAAGTTCCAG TAAAACCAAC AGAAGAAACA CCAGTAAATC CAAATGAAGG 1080

326 TACTACAGAA GGAACCTCAA TTCAAGAAGC AGAAAATCCA GTTCAACCTG CAGAAGAATC 1140

328 AACAACGAAT TCAGAGAAAG TATCACCAGA TACATCTAGC AAAAATACTG GGGAAGTGTC 1200

330 CAGTAATCCT AGTGATTCGA CAACCTCAGT TGGAGAATCA AATAAACCAG AACATAATGA 1260

332 CTCTAAAAAT GAAAATTCAG AAAAAACTGT AGAAGAAGTT CCAGTAAATC CAAATGAAGG 1320

334 CACAGTAGAA GGTACCTCAA ATCAAGAAAC AGAAAAACCA GTTCAACCTG CAGAAGAAAC 1380

336 ACAAACAAAC TCTGGGAAAA TAGCTAACGA AAATACTGGA GAAGTATCCA ATAAACCTAG 1440

338 TGATTCAAAA CCACCAGTTG AAGAATCAAA TCAACCAGAA AAAAACGGAA CTGCAACAAA 1500

340 ACCAGAAAAT TCAGGTAATA CAACATCAGA GAATGGACAA ACAGAACCAG AACCATCAAA 1560

342 CGGAAATTCA ACTGAGGATG TTTCAACCGA ATCAAACACA TCCAATTCAA ATGGAAACGA 1620

344 AGAAATTAAA CAAGAAAATG AACTAGACCC TGATAAAAAG GTAGAAGAAC CAGAGAAAAC 1680

346 ACTTGAATTA AGAAATGTTT CCGACCTAGA GTTA 1714
348 (2) INFORMATION FOR SEQ ID NO: 4: 

350 (i) SEQUENCE CHARACTERISTICS:

351 (A) LENGTH: 571 amino acids 

352 (B) TYPE: amino acid 

353 (C) STRANDEDNESS: single 

354 (D) TOPOLOGY: linear 
356 (ii) MOLECULE TYPE: protein 

359 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

361 Asn Tyr Asn Thr Asp Tyr Glu Leu Thr Ser Gly Glu Lys Leu Pro Leu
362 1               5                   10                  15

364 Pro Lys Glu Ile Ser Gly Tyr Thr Tyr Ile Gly Tyr Ile Lys Glu Gly
365             20                  25                  30

367 Lys Thr Thr Ser Glu Ser Glu Val Ser Asn Gln Lys Ser Ser Val Ala
368         35                  40                  45

370 Thr Pro Thr Lys Gln Gln Lys Val Asp Tyr Asn Val Thr Pro Asn Phe
371     50                  55                  60

373 Val Asp His Pro Ser Thr Val Gln Ala Ile Gln Glu Gln Thr Pro Val
374 65                  70                  75                  80

376 Ser Ser Thr Lys Pro Thr Glu Val Gln Val Val Glu Lys Pro Phe Ser
377                 85                  90                  95

379 Thr Glu Leu Ile Asn Pro Arg Lys Glu Glu Lys Gln Ser Ser Asp Ser
380             100                 105                 110

382 Gln Glu Gln Leu Ala Glu His Lys Asn Leu Glu Thr Lys Lys Glu Glu
383         115                 120                 125

385 Lys Ile Ser Pro Lys Glu Lys Thr Gly Val Asn Thr Leu Asn Pro Gln
386     130                 135                 140

388 Asp Glu Val Leu Ser Gly Gln Leu Asn Lys Pro Glu Leu Leu Tyr Arg
389 145                 150                 155                 160

391 Glu Glu Thr Met Glu Thr Lys Ile Asp Phe Gln Glu Glu Ile Gln Glu
392                 165                 170                 175

394 Asn Pro Asp Leu Ala Glu Gly Thr Val Arg Val Lys Gln Glu Gly Lys
395             180                 185                 190

397 Leu Gly Lys Lys Val Glu Ile Val Arg Ile Phe Ser Val Asn Lys Glu
398         195                 200                 205

400 Glu Val Ser Arg Glu Ile Val Ser Thr Ser Thr Thr Ala Pro Ser Pro
401     210                 215                 220

403 Arg Ile Val Glu Lys Gly Thr Lys Lys Thr Gln Val Ile Lys Glu Gln
404 225                 230                 235                 240

406 Pro Glu Thr Gly Val Glu His Lys Asp Val Gln Ser Gly Ala Ile Val
407                 245                 250                 255

409 Glu Pro Ala Ile Gln Pro Glu Leu Pro Glu Ala Val Val Ser Asp Lys

Input Set : N:\paola\09765271.txt
Output Set: N:\CRF3\03162001\I765271.raw

L:33 M:220 C: Keyword misspelled or invalid format, [(A) APPLICATION NUMBER:]
L:34 M:220 C: Keyword misspelled or invalid format, [(B) FILING DATE:]
L:54 M:220 C: Keyword misspelled or invalid format, [(ix) TELECOMMUNICATION INFORMATION:]
L:815 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:10
L:2577 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=41
L:2809 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=45
L:2849 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:46
L:2906 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=47
L:3236 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:52
L:3551 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:56
L:3733 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:58
L:3736 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:58
L:4217 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:66
L:4295 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:66
L:4858 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:74
L:4994 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=75
L:5062 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:76
L:5251 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:80
L:5284 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:80
L:5417 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:82
L:6864 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:6867 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:6879 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:6894 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:6935 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=107
L:7466 M:111 C: (47) String data converted to upper case,
L:7612 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:118
L:7792 M:111 C: (47) String data converted to upper case,
L:7886 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:120
L:10026 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:160
L:10147 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:162
L:10603 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:172
L:10606 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:172
L:10609 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:172
L:10829 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:176
L:11704 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:194
L:13925 M:111 C: (47) String data converted to upper case,